Python's Pandas library is great for all sorts of data-wrangling tasks. What doesn't come out of the box with Pandas is parallel processing. Here is a simple approach for taking a Pandas DataFrame and a function, and applying the function to chunks of the DataFrame in parallel.
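As a rough preview of the idea, here is a minimal sketch of a chunked, parallel apply. It assumes the standard library's `multiprocessing.Pool` and simple positional row slices. The names `parallelize_dataframe` and `add_double`, and the toy `value` column, are made up for illustration and are not part of Pandas.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def add_double(chunk):
    # Hypothetical per-chunk function. It is defined at module level
    # so that worker processes can pickle a reference to it.
    chunk = chunk.copy()
    chunk["double"] = chunk["value"] * 2
    return chunk


def parallelize_dataframe(df, func, n_workers=2):
    # Hypothetical helper: split the rows into roughly equal positional
    # slices, one per worker.
    bounds = np.linspace(0, len(df), n_workers + 1, dtype=int)
    chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    # Apply func to each chunk in a separate process; map keeps chunk order.
    with mp.Pool(n_workers) as pool:
        results = pool.map(func, chunks)
    # Stitch the processed chunks back into a single DataFrame.
    return pd.concat(results)


if __name__ == "__main__":
    demo = pd.DataFrame({"value": range(10)})
    out = parallelize_dataframe(demo, add_double)
    print(out.head(3))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script (Windows, and macOS by default). For very small DataFrames the cost of starting processes and copying chunks will likely outweigh any speedup, so this is only worth trying when the per-chunk work is substantial.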
First, let's download a dataset:
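import pandas as pd
import seaborn as sns

# load_dataset fetches the 'tips' data (cached after the first download)
# and already returns a DataFrame, so no pd.DataFrame wrapper is needed
df = sns.load_dataset('tips')
print(df.shape)    # (rows, columns)
print(df.head(3))  # first three rows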
Say we wanted to get tip percentage. We can